Introduction:

The position of an emergency transmitter may be determined by measuring
the Doppler shift of the distress signal as received by an orbiting satellite.
This requires the computation of an initial estimate and the refinement of this
estimate through an iterative, nonlinear, least-squares estimation.
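To make "iterative, nonlinear, least-squares estimation" concrete, here is a minimal Gauss-Newton sketch in Python. It is not the LSQ program: the flat-Earth geometry (a satellite flying along the x-axis at an assumed altitude H and speed V), the noiseless simulated data and the names `model` and `gauss_newton` are all hypothetical, chosen only to show how range-rate residuals drive the correction of a first guess. With two unknowns, each iteration solves a 2 by 2 system of normal equations.

```python
import math

# Hypothetical toy geometry (an assumption, not the report's orbit model):
# the satellite flies along the x-axis at altitude H with speed V, and the
# transmitter sits at an unknown ground position (px, py).
H = 1000.0  # km, assumed altitude
V = 7.0     # km/s, assumed speed


def model(px, py, t):
    """Return the range rate at time t and its partials with respect to px, py."""
    dx = V * t - px                      # along-track separation
    r = math.sqrt(dx * dx + py * py + H * H)
    rate = dx * V / r                    # d|satellite - transmitter|/dt
    d_px = -V * (py * py + H * H) / r**3
    d_py = -dx * V * py / r**3
    return rate, d_px, d_py


def gauss_newton(times, observed, px, py, max_iter=20, tol=1e-9):
    """Refine the first guess (px, py) by Gauss-Newton iterations."""
    for _ in range(max_iter):
        # Accumulate the 2 x 2 normal equations (J^T J) step = J^T residual.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for t, z in zip(times, observed):
            rate, j1, j2 = model(px, py, t)
            res = z - rate               # observed minus computed range rate
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            b1 += j1 * res
            b2 += j2 * res
        det = a11 * a22 - a12 * a12
        step_x = (a22 * b1 - a12 * b2) / det
        step_y = (a11 * b2 - a12 * b1) / det
        px, py = px + step_x, py + step_y
        if abs(step_x) + abs(step_y) < tol:
            break
    return px, py


# Usage with noiseless simulated observations.
times = [10.0 * k for k in range(-6, 7)]         # 13 epochs, seconds
truth = (50.0, 300.0)                             # km, made-up transmitter position
observed = [model(*truth, t)[0] for t in times]
estimate = gauss_newton(times, observed, 0.0, 200.0)
print(estimate)
```

In this toy the range rate depends on py only through py*py, so py and -py fit the data equally well; the sign of the first guess decides which solution the iteration settles on, one illustration of how the initial estimate can shape the outcome.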

A version of the above algorithm was implemented at Goddard Space Flight
Center (GSFC) and tested by locating a transmitter on the premises and obtaining
observations from a satellite. The computer used was an IBM 360/95. The
position was determined within the desired 10 km radius accuracy.

The purpose of this project is to determine the feasibility of performing
the same task in real time using microprocessor technology. The least squares
algorithm was implemented on an Intel 8080 microprocessor, and the same
experiment as at GSFC was run.


The results indicate that a microprocessor can easily match the IBM
implementation in accuracy and can perform the computation within the time
limits set.
Why Microprocessors:


Time is an implicit restriction in any search and rescue mission. The
use of satellites and computers is dictated by that time limit. The use of
a large computer to determine the position presupposes communication between
the satellite and the computer. This communication introduces a time delay,
since the satellite is not always within radio visibility of an installation
that possesses both the communication and the computing power for this problem.
Furthermore, the result has to be forwarded to a command center to do the
dispatching.

Microprocessor utilization can alleviate this situation in two ways:
by giving cheap computing power to communication facilities, or by incorporating
the computing power in the satellite itself, thus eliminating this communication
completely.

Microprocessors offer light-weight, small-volume, low-power processing.
Their speed is improving rapidly and their cost is going down. They are the
logical choice for a satellite search and rescue system if they can perform.



Machine Configuration:


Strictly speaking, there are three microprocessor configurations in
this project, which we discuss individually:

• Development system

• Minimal execution system

• Actual field configuration

Initially our development system consisted of an MDS-80 Intellec
microcomputer by Intel with 16k bytes of RAM memory and a resident ROM monitor.

Most of the floating point package was developed in machine language on that
system, using the monitor's limited hexadecimal editor and debugger. The need
for more sophistication became apparent. After several failures in exploring
alternatives (as fancy as hooking up to a PDP-11 through a telephone line
for more storage), we were able to acquire a dual floppy disk drive by Intel.

A spare line printer was attached to the system with minor hardware
modifications, and 16k bytes more RAM were added in order to support DOS. The
enhanced system had the software power of a minicomputer (assembler, editor,
library manager, linkage editor, loader, and a sufficient file manager) at a
speed which was slow but acceptable. The floating point package was converted
to assembly language, and two more packages were developed: the I/O package
and the matrix manipulation package. Unexpected help came from the use of
ICE-80 (In-Circuit Emulator), designed for a different application, as a
powerful symbolic debugger substituting for the monitor's hexadecimal debugger.

Out of this final version of the development system, only a limited
amount of resources was used for the final run; these define the minimal
execution system. The disk was used only for the input of data. The essential
parts were:

• The CPU card

• 16k bytes of memory

• The console device and its interface

• Power supply: 12V, 5V, -5V, ground

Additionally, the line printer was used to produce a hardcopy version of 
the results. 

The actual field configuration would be the same if the machine were
located on the ground. Some kind of communications equipment would be
required to provide the data input and, possibly, to start the run automatically.

The configuration would be different, though, if the machine were located on
the satellite. The requirements for the satellite configuration would be:

• The CPU card

• 16k bytes of memory

• An interface that can load the information into memory

• A means to communicate the result to the world

• Power supply: 12V, 5V, -5V, ground



The Floating Point Package: 


Based on estimates of the number of operations required, we were
inclined to think that any floating point operations would have to be performed
by hardware and not by software, since the estimated times became prohibitive.

This floating point package was developed to help us count the actual number
of operations rather than to perform them in an actual situation. The final run
proved our estimates wrong, and the package gained new importance.

There are a number of representations of floating point numbers, differing
in accuracy and range as a trade-off against the number of bytes required
per number. The one used was the ANSI format for FORTRAN, which happens to
be implemented in hardware as an option in IBM computers. It consists of one
sign bit, a seven-bit exponent (excess 64), and a 24-bit mantissa of
hexadecimal digits. The accuracy is 6 hexadecimal digits, or approximately 7.2
decimal digits. Specific operations were not timed, although a more general
timing analysis appears in a later section. This format was chosen as opposed
to the BCD format because the space requirements are lower for the same
amount of precision, which in turn reduces execution time slightly. A
mantissa of binary digits was not used because of the frequent need for
normalization.
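As a rough illustration of the 4-byte format just described, here is a Python sketch that packs and unpacks it. Six hexadecimal digits carry 6 × log10(16) ≈ 7.2 decimal digits, which is where the accuracy figure comes from. The function names are ours, and truncating (rather than rounding) extra digits when packing is an assumption, not a description of the report's input routine.

```python
def ibm32_encode(value: float) -> bytes:
    """Pack a float into 4 bytes: sign bit, excess-64 base-16 exponent, 24-bit fraction."""
    if value == 0.0:
        return bytes(4)                       # the all-zero word represents zero
    sign = 0x80 if value < 0 else 0x00
    frac, exp = abs(value), 64
    # Normalize so that 1/16 <= frac < 1: the leading hex digit is nonzero.
    while frac >= 1.0:
        frac /= 16.0
        exp += 1
    while frac < 1.0 / 16.0:
        frac *= 16.0
        exp -= 1
    if not 0 <= exp <= 127:
        raise OverflowError("exponent does not fit in 7 bits")
    mantissa = int(frac * (1 << 24))          # six hex digits, extra digits truncated
    return bytes([sign | exp]) + mantissa.to_bytes(3, "big")


def ibm32_decode(word: bytes) -> float:
    """Unpack the 4-byte format back into a Python float."""
    sign = -1.0 if word[0] & 0x80 else 1.0
    exp = (word[0] & 0x7F) - 64
    mantissa = int.from_bytes(word[1:4], "big")
    return sign * mantissa / (1 << 24) * 16.0 ** exp


print(ibm32_encode(1.0).hex())        # 41100000
print(ibm32_encode(-118.625).hex())   # c276a000
```

Because normalization only guarantees a nonzero leading hex digit, that digit may hold as few as one significant bit, so the effective binary precision varies between 21 and 24 bits; this "wobble" is the usual price of a base-16 exponent.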

Addition and subtraction take exactly the same time, whereas a
multiplication takes approximately as long as 22 additions and a division
approximately as long as 60 additions.

Multiplication produces a 48-bit result mantissa, which is then normalized
and rounded to 24 bits. This preserves the number of significant digits or,
viewed from a different angle, is the same as a double precision multiply in
which the two arguments were expanded with zero fill.
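The multiplication rule can be sketched directly on the 4-byte words: form the full 48-bit product of the two 24-bit fractions, shift by one hex digit if the leading digit is zero, and round back to 24 bits. This Python version is a sketch under stated assumptions (normalized or exactly zero inputs, round-half-up, a hypothetical function name `ibm32_mul`), not a transcription of the 8080 routine.

```python
def ibm32_mul(a: bytes, b: bytes) -> bytes:
    """Multiply two 4-byte hexadecimal floating point words, rounding to 24 bits."""
    fa = int.from_bytes(a[1:4], "big")
    fb = int.from_bytes(b[1:4], "big")
    if fa == 0 or fb == 0:
        return bytes(4)
    sign = (a[0] ^ b[0]) & 0x80
    exp = (a[0] & 0x7F) + (b[0] & 0x7F) - 64
    prod = fa * fb                        # full 48-bit product of the fractions
    # Normalized inputs (fa, fb >= 0x100000) give prod >= 2**40, so at most
    # one leading hex digit of the 48-bit product can be zero.
    if prod < 1 << 44:
        prod <<= 4
        exp -= 1
    frac = (prod + (1 << 23)) >> 24       # round half up to 24 bits
    if frac == 1 << 24:                   # rounding carried out of the top digit
        frac >>= 4
        exp += 1
    if not 0 <= exp <= 127:
        raise OverflowError("exponent out of range")
    return bytes([sign | exp]) + frac.to_bytes(3, "big")


two, three = bytes.fromhex("41200000"), bytes.fromhex("41300000")
print(ibm32_mul(two, three).hex())    # 41600000, i.e. 6.0
```

Multiplying the stored value of 1/3 (40555555) by 3 gives 40FFFFFF, i.e. 0.99999994: the representation error of 1/3 survives the multiplication, which is what single precision arithmetic should do.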

Division again preserves the significant digits by expanding the mantissa
of the dividend to double precision, and yields a full single precision
result. Normalization and rounding occur as in multiplication.
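Division can be sketched the same way: the dividend's fraction is extended with extra low-order zero bits before the integer divide, so the quotient carries guard bits beyond the 24 that are kept, and it is then normalized and rounded as in multiplication. The 32-bit extension, the round-half-up rule and the name `ibm32_div` are assumptions of this Python sketch.

```python
def ibm32_div(a: bytes, b: bytes) -> bytes:
    """Divide a by b (4-byte hexadecimal floating point words), rounding to 24 bits."""
    fa = int.from_bytes(a[1:4], "big")
    fb = int.from_bytes(b[1:4], "big")
    if fb == 0:
        raise ZeroDivisionError("division by zero")
    if fa == 0:
        return bytes(4)
    sign = (a[0] ^ b[0]) & 0x80
    exp = (a[0] & 0x7F) - (b[0] & 0x7F) + 64
    # Extend the dividend with 32 zero bits, so q = fa * 2**32 // fb holds
    # fa/fb with at least 8 guard bits below the 24 that are kept.
    q = (fa << 32) // fb
    shift = 8
    if q >= 1 << 32:                       # fa >= fb: quotient >= 1, shift one hex digit
        shift = 12
        exp += 1
    frac = (q + (1 << (shift - 1))) >> shift   # round half up to 24 bits
    if frac == 1 << 24:                    # rounding carried out of the top digit
        frac >>= 4
        exp += 1
    if not 0 <= exp <= 127:
        raise OverflowError("exponent out of range")
    return bytes([sign | exp]) + frac.to_bytes(3, "big")


one, two, three = (bytes.fromhex(h) for h in ("41100000", "41200000", "41300000"))
print(ibm32_div(one, three).hex())    # 40555555, i.e. 1/3
print(ibm32_div(two, three).hex())    # 40aaaaab, i.e. 2/3 rounded up
```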



Accuracy is thus preserved to true single precision throughout, in a
numerically stable manner, while keeping the length of the number to 4 bytes.
The cost is expensive multiplication and, especially, division. This dictates
a programming style whereby division is avoided unless it is absolutely
necessary. The benefits, on the other hand, are numerically stable
implementations whose results match the double precision results to the extent
possible, as will be seen when the results of the run are analyzed.

The square root function was implemented using a variation of Heron's
formula, based on the observation that the mantissa of any floating point
number will have a value of 1/16 to 1 (interpreted as a fraction). As a first
guess, an approximation to a straight line connecting the two end points is
made. Experimentally, six iterations were found necessary to produce an
accurate result. A better first guess could improve that significantly, but
time constraints did not allow us to pursue that direction.
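A Python sketch of the square root follows. We read "a straight line connecting the two end points" as the line through (1/16, 1/4) and (1, 1), i.e. g = 0.8 f + 0.2; that reading, like the function name `heron_sqrt`, is our assumption. This guess is never off by more than 20 percent (the worst case is f = 1/4), and each Heron step roughly squares the relative error.

```python
import math


def heron_sqrt(x: float, iterations: int = 6) -> float:
    """Square root via Heron's formula applied to the base-16 fraction of x."""
    if x < 0.0:
        raise ValueError("square root of a negative number")
    if x == 0.0:
        return 0.0
    # Split x = f * 16**e with 1/16 <= f < 1, as in the hexadecimal format.
    f, e = x, 0
    while f >= 1.0:
        f /= 16.0
        e += 1
    while f < 1.0 / 16.0:
        f *= 16.0
        e -= 1
    # First guess: the straight line through (1/16, 1/4) and (1, 1),
    # the end points of sqrt(f) on the fraction's range.
    g = 0.8 * f + 0.2
    for _ in range(iterations):
        g = 0.5 * (g + f / g)             # Heron's step
    # sqrt(f * 16**e) = sqrt(f) * 4**e, so the exponent needs no root at all.
    return g * 4.0 ** e


print(heron_sqrt(2.0), math.sqrt(2.0))
```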

Finally, input and output of floating point numbers turned out to be a much
more serious task than first expected. The input routine recognizes numbers
with a maximum of ten integer and ten fraction digits. This proved more
than sufficient for our needs. The output routine produces a rigid scientific
format with 10 fraction digits. When interpreting the results it should be
kept in mind that at most 7 of them are significant. The format was retained
in case of future expansion of the mantissa. Both the input and output
routines could be better, but since their function is only tangential to the
project at hand they were kept at the bare functional level.
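The rigid output layout (a fraction between 0.1 and 1 with ten digits, then a signed two-digit exponent, e.g. 0.1003682136E+04) can be sketched in Python as below. `rigid_sci` is a hypothetical name, and the sketch leans on Python's own decimal conversion rather than reproducing the 8080 output routine. Only the first seven or so printed digits are meaningful.

```python
def rigid_sci(x: float, digits: int = 10) -> str:
    """Format x as [-]0.ddddddddddE+ee: a fraction in [0.1, 1) with `digits` digits."""
    # Python's e-format gives d.ddd...e+XX with the requested significant
    # digits, already rounded (including any carry into a new leading digit).
    text = f"{abs(x):.{digits - 1}e}"
    mantissa, exponent = text.split("e")
    body = mantissa.replace(".", "")
    # Moving the point one place left raises the decimal exponent by one.
    exp10 = int(exponent) + 1 if x != 0 else 0
    sign = "-" if x < 0 else ""
    return f"{sign}0.{body}E{exp10:+03d}"


print(rigid_sci(1003.682136))   # 0.1003682136E+04
print(rigid_sci(-118.625))      # -0.1186250000E+03
```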



Matrix Operations:

All matrices in the system are defined as two-dimensional, including
vectors. The first two bytes contain the number of rows and the number of
columns in the particular matrix, respectively. This effectively limits the
number of observations to 256. Vectors have one of their dimensions
identically equal to 1. The next two bytes contain the address of the first
byte that follows the last byte belonging to the matrix. Adjacent elements in
a row of the matrix are stored as adjacent floating point numbers in memory.
Rows are stored sequentially, starting from the first row in the fifth byte.
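The storage layout just described can be sketched in Python as follows. The helper names are hypothetical, the 4-byte elements are left as zero bytes, and storing the end address low byte first is our assumption (it is the 8080's usual convention for 16-bit addresses, but the text does not say).

```python
def make_matrix(base: int, rows: int, cols: int) -> bytearray:
    """Lay out a zero-filled rows x cols matrix of 4-byte numbers starting at address base."""
    if not (0 < rows < 256 and 0 < cols < 256):
        raise ValueError("each dimension must fit in a single byte")
    size = 4 + 4 * rows * cols              # 4-byte header plus the elements
    end = base + size                       # first byte after the matrix
    if end > 0xFFFF:
        raise ValueError("matrix does not fit below address 0xFFFF")
    # Bytes 0-1: dimensions; bytes 2-3: end address, low byte first (assumed).
    header = bytes([rows, cols, end & 0xFF, end >> 8])
    return bytearray(header) + bytearray(4 * rows * cols)


def element_offset(matrix: bytes, i: int, j: int) -> int:
    """Byte offset of element (i, j), 0-based, in row-major order."""
    rows, cols = matrix[0], matrix[1]
    if not (0 <= i < rows and 0 <= j < cols):
        raise IndexError("element outside the matrix")
    return 4 + 4 * (i * cols + j)           # the first element starts at the fifth byte


m = make_matrix(0x2000, 2, 3)
print(m[:4].hex(), element_offset(m, 1, 2))   # 02031c20 24
```

Keeping the end address in the header lets a routine step from one matrix to the next without recomputing rows × cols × 4, a multiplication the 8080 would otherwise have to do in software.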

In an effort to minimize the number of address calculations in the least
squares algorithm, the APL program we were supplied with (LSQ) was converted
into FORTRAN. The calculations involved in the residual equations were all
grouped together inside one big loop. The advantage of such a scheme is that
once an offset is calculated it can be used to address all the needed elements
of the matrices involved in the calculation. When the time came, though, to
implement it in 8080 assembly language, it became all too apparent that
there were too many addresses to keep track of and too few registers to help.
Therefore, due to the limited addressing capabilities, routines were
implemented for the various matrix operators in APL. This resulted in
well-structured and very efficient code, the style being dictated by the
instruction set.

A minimum number of matrix utility routines was necessary. Matrices can
be created by specifying their dimensions, filled with zeros, read from a
device, and moved (copied) in storage.

There are four classes of operations by which matrices may be altered,
involving the following arguments:

• a constant and a matrix

• a vector and a matrix

• two matrices (plus possibly a result matrix)

• one matrix (for example, inversion)


In our particular application only one inversion, of a 2 by 2 matrix, was
involved. A simple algorithm derived from Euler's method is implemented
using fixed pivots. Execution time and temporary storage are optimized.
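The report does not spell out the fixed-pivot algorithm, so the following Python sketch uses plain Gauss-Jordan elimination on the diagonal pivots with no row interchanges (our substitution, not necessarily the report's "Euler's method" variant). In the spirit of the floating point package, it forms only two reciprocals and replaces every other division by a multiplication; it fails if a pivot is zero, which is the price of fixed pivots. The function name is hypothetical.

```python
def invert_2x2_fixed_pivot(a11, a12, a21, a22):
    """Invert [[a11, a12], [a21, a22]] by elimination on the diagonal pivots.

    Fixed pivots: no row interchanges, so a zero pivot raises ZeroDivisionError.
    Only two divisions (the pivot reciprocals) are performed.
    """
    r1 = 1.0 / a11                 # reciprocal of the first pivot
    m = a21 * r1                   # multiplier that eliminates a21
    u = a12 * r1                   # a12 scaled by the first pivot
    r2 = 1.0 / (a22 - m * a12)     # reciprocal of the second pivot
    return ((r1 + u * m * r2, -u * r2),
            (-m * r2, r2))


inverse = invert_2x2_fixed_pivot(4.0, 7.0, 2.0, 6.0)
print(inverse)   # approximately ((0.6, -0.7), (-0.2, 0.4))
```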



Implementing the Experiment:


Having developed the tools discussed in the previous sections, we found the
actual implementation straightforward. For reasons already mentioned, a
routine was written to match the LSQ routine* developed by Dr. Marini almost
statement by statement. The correspondence is indicated in the source program
by keeping track of the APL statement numbers. The array names were kept the
same as much as possible, and only one additional temporary matrix was
required. The program was written for a maximum of 100 observations.

All matrix operations, as well as the square root, keep track of the calls to
the floating point routines.

The whole package makes limited use of two monitor routines, which can
easily be eliminated. They are there because the software was developed in
machine language and the monitor provided a lot of needed help. So,
essentially, LSQ can be run completely independently.

The space requirements for this particular run were approximately 16k
bytes, of which 4k could be in ROM and 12k in RAM. The exact numbers
are as follows:

Code: 3656 bytes 

Data: 10365 bytes 

Stack: 100 bytes (arbitrarily) 

Total: 14121 bytes 

Incorporated into the package were four counting routines that kept
track of the number of additions, subtractions, multiplications and divisions
required during each iteration. The results are analyzed in the next section.
The actual implementation would not require these routines. The counting
overhead for each arithmetic operation is approximately equal to half the time
of an addition.
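The counting routines can be pictured as thin wrappers that tally each operation before performing it. The Python sketch below is hypothetical (class and method names are ours); it counts the operations in a three-element inner product, which needs three multiplications and two additions.

```python
from collections import Counter


class CountingArithmetic:
    """Hypothetical stand-in for the four counting routines: every call bumps a tally."""

    def __init__(self):
        self.counts = Counter()

    def add(self, a, b):
        self.counts["add"] += 1
        return a + b

    def sub(self, a, b):
        self.counts["sub"] += 1
        return a - b

    def mul(self, a, b):
        self.counts["mul"] += 1
        return a * b

    def div(self, a, b):
        self.counts["div"] += 1
        return a / b


def dot(ops, x, y):
    """Inner product of two equal-length vectors, routed through the counting operations."""
    total = ops.mul(x[0], y[0])
    for a, b in zip(x[1:], y[1:]):
        total = ops.add(total, ops.mul(a, b))
    return total


ops = CountingArithmetic()
value = dot(ops, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(value, dict(ops.counts))   # 32.0 {'mul': 3, 'add': 2}
```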


* See Appendix C.



Interpreting the Results:


The final run converged and yielded five digits of accuracy. If
convergence is defined as the ratio of two successive RMS residuals being close
to 1 (in absolute value), it was attained at the ninth iteration to within
0.00001. Comparing these results to the run at GSFC (run in double precision,
or 16 digits of accuracy), we note the 5-digit accuracy of our result.
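The convergence test just described reduces to a one-line check on successive RMS residuals. This Python sketch assumes the residuals are nonzero; the function names and the sample residual history are made up for illustration and are not data from the runs.

```python
def ratio_converged(rms_prev: float, rms_curr: float, tol: float = 1e-5) -> bool:
    """True when |rms_curr / rms_prev| is within tol of 1."""
    return abs(abs(rms_curr / rms_prev) - 1.0) < tol


def first_converged_iteration(rms_history, tol=1e-5):
    """Return the 1-based iteration at which the ratio test first passes, or None."""
    for k in range(1, len(rms_history)):
        if ratio_converged(rms_history[k - 1], rms_history[k], tol):
            return k + 1
    return None


# Made-up RMS residuals, for illustration only.
history = [5.0, 2.0, 1.0, 0.95, 0.9500001]
print(first_converged_iteration(history))   # 5
```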

Numerical analysis gives us enough tools to justify the loss of two
significant digits in the course of the iterations. The main source of error
appears to be the subtraction of the estimated range rates from the actual
ones. The subtraction of the average residual equations could contribute to
the error as well.

The measured execution time for this particular run was 62 seconds per
iteration. The microprocessor used was an Intel 8080A; adjusting for the
overhead of counting the number of operations, the true time becomes 61
seconds. The 8080A CPU has a cycle time of 2 microseconds. If this system were
actually implemented, the 8080A-1 CPU could be used, which offers higher speed
with a cycle time of 1.3 microseconds. This could bring the execution time
down to about 40 seconds per iteration (61 × 1.3/2), giving approximately 6
minutes to reach convergence in nine iterations. This figure is derived with no
modification of the software. Since it falls within our definition of "real
time", which was around 15 minutes, it is definitely a workable solution.
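The timing extrapolation above is simple proportional arithmetic, sketched here in Python. It assumes, as the text implicitly does, that execution time scales linearly with the CPU cycle time (no memory wait states or I/O bottlenecks); all figures are taken from the text.

```python
# Figures from the text; linear scaling with cycle time is an assumption.
seconds_per_iteration_8080a = 61.0   # measured 62 s minus counting overhead
cycle_8080a_us = 2.0
cycle_8080a1_us = 1.3
iterations_to_converge = 9           # convergence reached at the ninth iteration
real_time_limit_minutes = 15.0

seconds_per_iteration_8080a1 = (seconds_per_iteration_8080a
                                * cycle_8080a1_us / cycle_8080a_us)
total_minutes = seconds_per_iteration_8080a1 * iterations_to_converge / 60.0

print(f"{seconds_per_iteration_8080a1:.2f} s per iteration")   # 39.65 s per iteration
print(f"{total_minutes:.1f} min to converge")                  # 5.9 min to converge
print(total_minutes < real_time_limit_minutes)                 # True
```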

Another alternative is, of course, to use hardware floating point units.
Two units that we are familiar with show a disparity in execution times of
several orders of magnitude. Their specifications appear in Appendix B. For
the purposes of the following analysis, "typical" execution times for 8
digits of precision of the North Star Computers, Inc. FPB unit were used.

Our system indicated the following frequency of floating point operations
for each iteration:

Additions - 3137

Subtractions - 512

Multiplications - 2382

Divisions - 940

When trying to compute the time it would take to execute those operations,
we noticed that the time it takes to access the hardware floating point unit
is more than twice the time it takes to do the calculations. Namely, we came
up with the following numbers (FPB driven by an 8080A-1):

TIME (SEC)    PURPOSE

0.35          perform the operations

0.825         transfer the numbers to and from the FPB

1.175         total time required

Therefore, the use of hardware units makes it possible to decrease the
execution time by one order of magnitude.



Future Research:


The parameters that have to be optimized in the search and rescue
mission are the accuracy of the position estimation and the time in
which it is performed. Proving the feasibility of a microprocessor
implementation is far from devising an optimal algorithm.

If the nonlinear regression method is utilized, there is a lot of
room for improvement in the initial estimate, a quantity that can affect
the whole outcome of the iterations. Several methods suggested in Dr.
Marini's paper can be explored. Furthermore, since the data collection takes
an appreciable amount of time, an algorithm should be devised in which the
estimate is updated with each incoming datum. If that algorithm is good
enough, then the estimate could be the result itself.

A further reduction in calculation time can be achieved through
parallelism, which can appear on two levels:

• The implementation of the least squares algorithm

• The grouping of data

The least squares algorithm may be broken into subtasks, especially the
floating point operations, that can be performed by different processors in
parallel.

The data may be grouped in clusters on which the least squares
algorithm is applied. The estimates provided by the clusters are then
themselves processed through least squares estimation. This method could be
applied at data collection time too.



Appendix A

Sample run at GSFC

[Printout of the result, in the form "THE RESULTING POSITION IS: X= Y= Z="
followed by "RMS RESIDUALS ="; the numerical values are not legible in this
copy.]

Sample run at Columbia

[Printout of the successive iterations, each in the form "THE RESULTING
POSITION IS: X= Y= Z=" followed by "RMS RESIDUALS ="; the numerical values are
largely illegible in this copy.]



Appendix B

Typical hardware floating point units:

• FPB by North Star Computers, Inc.

• FPU by Cybernetic Micro Systems



FPB DATA SHEET

EXECUTION TIMES (notes 1, 2, 3)

                          PRECISION (DIGITS)
                       2     4     6     8    10    12    14
ADD       best         ?     1     1     1     1     1     ?
          typical      ?     ?     9     9    10    10     ?
          worst        ?     ?    10    11    11    12     ?
SUBTRACT  best         4     ?     4     4     ?     ?     ?
          typical      8     ?     9     9     ?     ?     ?
          worst       15     ?    17    18     ?     ?     ?
MULTIPLY  best         5     5     5     5     5     5     5
          typical     18    34    55    80   111   146   186
          worst       51   125   228   382   527   720   933
DIVIDE    best         7     7     7     7     7     ?     7
          typical     39    70   109   156   211     ?   370
          worst       62   139   229   340   470     ?   779

(? = illegible)

1. Times given in microseconds 

2. Execution times are a function of the input values 

3. Times listed do not include transmission of input values and result 


Board dimensions: 

Model A: 8 in. by 10 in.

Model B: 6% in. by 12 in. 

Power requirements: 

Model A: 8 V (unregulated) @ 1.7 A 
Model B: 5 V (regulated) @ 1.7 A 

Board Construction: 

FR4 material, gold plated edge connectors 

Floating point number representation:

Byte 1: bit 7 = sign (1 = negative number, 0 = positive number)
        bits 6-0 = exponent in excess-64 binary representation
        bits 7-0 all zero represent the value zero
Byte 2: bits 3-0 = least significant digit of value in BCD coding
        bits 7-4 = next least significant digit of value

Byte n: bits 7-4 = most significant digit of value in BCD coding
        bits 3-0 = next most significant digit of value

All values are normalized.

Other representations of BCD floating point numbers require a change in
microcode and are available on special order.






































*Sample use of the North Star FPB for a divide operation with 8 digit precision
*In this example assume arguments are in memory in the form:
*   Most significant byte (msb) digit pair
*   Subsequent digit pairs follow the msb
*   Exponent/sign byte follows the lsb digit pair
*   Pointer addresses the exponent/sign byte
*BC has left arg pointer
*DE has right arg pointer
*HL has result pointer
*The FPB receives its arguments by "peeking" at the 8080 bus
*when the argument values are loaded to the accumulator.
*Two jumperable "hardwired" addresses are required for signaling the FPB.
*This routine may be generalized to perform any operation, at any precision.

FDIV    LDA   RSTRT          ; This "hardwired" reference signals FPB to "wake up"
        MVI   A,8*16+DIVOP   ; Specify precision and operation code to FPB
        LDAX  D              ; Exponent/sign byte of right arg
        DCX   D              ; Advance pointer to next byte
        LDAX  D              ; Least significant digit pair of right arg
        DCX   D              ; Advance pointer to next byte
        LDAX  D
        DCX   D
        LDAX  D
        DCX   D
        LDAX  D              ; Most significant digit pair of right arg
        LDAX  B              ; Exponent/sign byte of left arg
        DCX   B
        LDAX  B              ; Least significant digit pair of left arg
        DCX   B
        LDAX  B
        DCX   B
        LDAX  B
        DCX   B
        LDAX  B              ; Most significant digit pair of left arg
;       Now the Floating Point Board is performing the operation
        LXI   D,FPDIN        ; "Hardwired" address for receiving value from FPB
FDIV1   LDAX  D              ; Loop waiting for completion signal (sign bit)
        ORA   A              ; The FPB is done when the sign bit becomes "1"
        JP    FDIV1          ; Loop if sign bit is still "0"
        ANI   EBITS          ; Check for error, condition tested at end
        LDAX  D              ; Exponent/sign of result
        MOV   M,A            ; Store exponent/sign of result
        DCX   H              ; Advance pointer
        LDAX  D              ; Least significant digit pair of result
        MOV   M,A
        DCX   H
        LDAX  D
        MOV   M,A
        DCX   H
        LDAX  D
        MOV   M,A
        DCX   H
        LDAX  D              ; msb byte of result
        MOV   M,A            ; Store it
        RZ                   ; Return if no error was detected
        JMP   ERROR          ; Go report error (i.e. underflow or divide by 0)



FLOATING POINT UNIT

PRICE LIST

MODEL     QUANTITY 1        25        100
  1         $595.00    $535.00    $475.00
  2          470.00     425.00     375.00
  ?          345.00     315.00     275.00

All sales FOB Palo Alto

EXECUTION TIMES

FUNCTION                     TIME IN MILLISECONDS (approximate)
ADD, SUB                      110
MUL, DIV, SQRT                225
TAN                           846
LN, SIN, COS, [illegible]    1250
POWER                        1720

CYBERNETIC MICRO SYSTEMS
2460 EMBARCADERO WAY
PALO ALTO, CA 94303
(415) 321-0410




Appendix C

APL least squares program

[Listing of the APL function LSQ, largely illegible in this copy. The legible
comment lines mark the first guess for the transmitter position, the
range-rate vectors (including the vector of measured range rates), the
calculation of the matrix of residual equations, the least squares solution of
the residual equations, the RMS residuals, and the transformation from
Cartesian to geodetic coordinates.]



ORIGINAL PAGE IS 
OF POOR QUALITY 


Bibliography


Sterbenz, Pat H. (1974), "Floating-Point Computation",
Prentice-Hall, Inc., Englewood Cliffs, N.J.

Hashizume, Burt (Nov. 1977), "Floating Point Arithmetic", Byte,
Vol. 2, No. 11, pp. 76-78, 180-188.

Marini, John W. (Oct. 1976), "Initial Position Estimates for
Satellite-Aided Search and Rescue", Goddard Space Flight
Center, Greenbelt, Maryland.



Microprocessor Utilization in Search & Rescue Missions

FINAL REPORT


Introduction ; 

The position of an emergency transmitter nay be determined by neasuring 
the Doppler shift of the distress signal as received by an orbiting satellite. 
This requires the corputation of an initial estimate and refinement of this 
estiitabe through an iterative, nonlinear, least-squares estimation. 

A version of the above algorithm VTas inplenented at Goddard Space Flight 
Center (GSFC) and tested by locating a transmitter on the premises and obtaining 
observations from a satellite. The ODnputer \ised was an IBM 360/95'. The po- 
sition was determined within the desired 10 Jon radius accuracy. 

The purpose of this project is to determine the feasibility of performing 
the same task in real time using microprocessor technology. The least square 
algorithm \«as irrplenented on an Intel 8080 microprocessor and. the sane experi- 
ment was run as at GSFC- 

The results indicate that a micropixxiessor can easily match the IBM im- 
plerrentation in accuracy and be perfomed inside the time limitations set. 



t'Jhy Microprocessors ; 

Tine is an inplicit restriction in any searcii and rescue mission. The 
.use of satellites and ccirputers is dictated by that time limit. The use of 
a big coirputer to determine the position presupposes ccsTinunication beta'Te^ the 
satellite and tlie ccstputer. This comnunication introduces a time delay since 
the satellite is not always within radio visibility of an installation that 
possesses both the communication and ccitputing pov^r for this problem. Fur- 
thermore the result has to be forwarded to a coimEnd center to do the dispatching. 

Microprocessor utilization can alleviate this situation in two ways: 
by giving cheap ocaputing power to ccurmunication facilities or by Incorporating 
the conputing pcnijer in the satellite itself thus eliminating this oommunication. 
coirpletely. 

Micrcprocessors offer light weight/ small volume/ low pcwer processing. 

Their speed is iirproving rapidly and their cost is going down. They are the 
logical choice for a satellite search and rescue system if th^ can perform. 



Machine Configuration; 


Strictly speaking there are three micropijocessor configurations in 
this project v^ch are going to discuss individually. 

. Developsnent system 

• Mininal execution system 

* Actual field configuration 

Initially our development system consisted of an MDS-80 Intellec micro- 
coirputer by Intel v?ith 16k bytes of RAM nonory and a resident RDM monitor. 

Most of the floating point package was developed in-irachine language on that 
system using the itonitor's limited hexadeciml editor and debugger. Che need 
for more sophistication becaite apparent. After several failures in ej<ploring 
alternatives (as fancy as hooking vp to a PDP 11 through a telephone line 
for more storage) we were able to acquire a dual floppy di^ drive by Intel. 

A spare line printer was attached to the system with minor hardware modifica- 
tions and 16k bytes more RAM were added in order to support DCS. The enhanced 
system had the power of a xnini-corrputer in software (assembler, editor, li- 
brary rronager, linkage editor, leader, and a sufficient file manager) at a 
speed vdiich was slow but acceptable. The floating point package was converted 
to assembly language, and two more packages were developed; the I/O package 
and the natrix manipulation package. Unexpected help came from the use of 
ICE-80 (In Circuit Emulator) , designed for a different application, as a 
pxowerful symbolic debugger substituting for the nonitor hexadecimal debugger. 

Out of this final version of the developnent system only a limited 
amount of resources were used for the final run. Those define the minimal 
execution syston. The disk was only used for irput of data. The essential 
parts were: 

♦ Ihe CPU card 


• 16k ^tes of nemory 



• The console device and its interface 


• Power sijpply; 12V, 5V, -5V, groijnd 

Additionally, the line printer was used to produce a hardcopy version of 

\ 

the results. 

The actual field configuration would be the same if the machine were 
located on the ground. Soire kind of corrmunications eguiprtent vrauld be re- 
quired to provide the data input and, maybe, start the run autcanatically. 

The configuration would be different, though, if the machine were located on 
the satellite. The requireitents for the satellite configuration would be: 

• The CPU card 

• 16k bytes of memory 

An Interface that can load the informtion in memory 

• A means to comnunicate the result to the world 

• Pot^r sv^Jply; 12V, 5V, -5V, ground 



The Floating Point Package; 


Based on estirtates of the number of operations required we were in- 
clined to think that any floating point operations would have to be perforrred 
by hardware and not by software since estimated tines became prohibitive. 

This floating point package was developed to help us count the actual number 
of operations rather than' perform then in an actual situation. The final run 
proved our estimates wrong and the package gained new irrportance. 

There are a number of representations of floating poing numbers dif- 
fering in accuracy and range as a trade off to the number of bytes required 
per number. The one used was the ANSI format for POETRAN which happens to 
be irrplenented by hardware as an option in IBM coirputers. It consists of one 
sign bit, a seven bit ejjponent (excess 64 ) , and a 24 bit nentissa of hexa- 
decinal digits. The accuracy is 6 hexadecimal digits or approximately 7,2 
decinal digits. Specific operations were not timed although a more general 
timing analysis appears in a later section. This foritat vras chosen as opposed 
to the BCD fomat because the space requirements are laver fo r the same 
amount of precision, which in turn reduces execution time slightly. A man- 
tissa of binary digits was not used because of the frequent need for normali- 
zation. 

Addition and subtraction take exactly the same time, whereas multipli- 
cation is approximately equal to 22 addition and division is approximately 
60 additions. 

Multiplication produces a 48 bit result mantissa which is then normalized 

and rounded to 24 bits. This preserved- the number of significant digits, or, 

viewed fran a different angle, is the same as a doiable precision multiply if 

• • ** 

the two arguments were expanded w?ith zero fill. 

Division preserves the significant digits again by expanding the man- 
tissa of the dividend to double precision and results in full single preci- 
sion result. Nbrnalization and rounding occur as in multiplication. 



^curacy is thus preserved to true single precision throughout in a 
nuitierically stable itenner keeping the length of the nurrber to 4 bytes. The 
cost is ej<pensive multiplication and, ejqpeciallu, division. This dictates 
a prograimdng style viiereby division is avoided unless it is a£>solutely neces- 
sary. The benefits, on the other hand, are nunerically stable iirplementa- 
tions whose results match the double precision to the extent possible as 
will be seen when the resvilts of the run are analyzed. 

The square root function was inplemented by using a v^iation of Heron's 
formula based on the observation that the nentissa of any floating point num- 
ber will have a value of 1/16 to 1 (interpreted as a fraction) . As a first 
guess an approxiitation to a straight line connecting the two end points is 
made. Experimentally, six iterations were found necessary to produce an 
accurate result. A better first gress could inprove that significantly, but 
tine constraints did not allow us to pursue that direction. 

Finally, input and output of floating point numbers turned out to be a much more serious task than first expected. The input routine recognizes numbers with a maximum of ten integer and ten fraction digits. This proved more than sufficient for our needs. The output routine produces a rigid scientific format with 10 fraction digits. When interpreting the results it should be kept in mind that at most 7 of them are significant. The format was retained in case of future expansion of the mantissa. Both the input and output routines could be better, but since their function is only tangential to the project at hand they were kept at the bare functional level.



Matrix Operations:


All matrices in the system are defined as two dimensional, including vectors. The first two bytes contain the number of rows and the number of columns of the particular matrix, respectively. This effectively limits the number of observations to 256. Vectors have one of their dimensions identically equal to 1. The next two bytes contain the address of the first byte that follows the last byte belonging to the matrix. Adjacent elements in a row of the matrix are stored as adjacent floating point numbers in memory. Rows are stored sequentially, starting with the first row at the fifth byte.
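The layout can be made concrete with a short Python sketch that builds the 4-byte header and computes element addresses. It assumes the 16-bit end address is stored low byte first, as the 8080 does for 16-bit words, and that elements are numbered from zero. The function names are hypothetical, and this sketch simply rejects dimensions that do not fit in a single byte.

```python
import struct

HEADER_BYTES = 4    # rows (1 byte), columns (1 byte), end address (2 bytes)
ELEMENT_BYTES = 4   # one 4-byte floating point number per element


def matrix_header(base: int, rows: int, cols: int) -> bytes:
    """Build the 4-byte header of a rows x cols matrix stored at address `base`."""
    if not (0 < rows < 256 and 0 < cols < 256):
        raise ValueError("each dimension must fit in one byte")
    end = base + HEADER_BYTES + ELEMENT_BYTES * rows * cols  # first byte after the matrix
    if end > 0xFFFF:
        raise ValueError("matrix does not fit in a 16-bit address space")
    return struct.pack("<BBH", rows, cols, end)              # end address low byte first


def element_address(base: int, cols: int, i: int, j: int) -> int:
    """Address of element (i, j), counting from zero; rows follow one another."""
    return base + HEADER_BYTES + ELEMENT_BYTES * (i * cols + j)


if __name__ == "__main__":
    print(matrix_header(0x4000, 3, 2).hex())          # 03021c40
    print(hex(element_address(0x4000, 2, 1, 0)))      # 0x400c
```

For a 3 by 2 matrix at 4000H the header is 03 02 1C 40, and the last element, (2, 1), occupies 4018H to 401BH, ending just before the stored end address 401CH.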

In an effort to minimize the number of address calculations in the least squares algorithm, the APL program we were supplied with (LSQ) was converted into FORTRAN. The calculations involved in the residual equations were all grouped together inside one big loop. The advantage of such a scheme is that once an offset is calculated, it can be used to address all the needed elements of the matrices involved in the calculation. When the time came, though, to implement it in 8080 assembly language, it became all too apparent that there were too many addresses to keep track of and too few registers to help. Therefore, because of the limited addressing capabilities, routines were implemented for the various matrix operators in APL instead. This resulted in well structured and very efficient code, the style being dictated by the instruction set.

A minimum number of matrix utility routines was necessary. Matrices can be created by specifying their dimensions, filled with zeros, read from a device, and moved (copied) in storage.

There are four classes of operations by which matrices may be altered, involving the following arguments:

• a constant and a matrix

• a vector and a matrix

• two matrices (plus possibly a result matrix)

• one matrix (for example, inversion)
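The utilities and the first three operation classes can be illustrated with a small Python sketch over a row-major list, mirroring the storage order described above. It is a hypothetical illustration, not the 8080 routines. Adding a row vector to every row is an assumed example of the vector-and-matrix class, and inversion is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Matrix:
    """Row-major matrix; a vector is a matrix with one dimension equal to 1."""
    rows: int
    cols: int
    data: list = field(default_factory=list)

    def __post_init__(self):
        if not self.data:                       # create by dimensions, filled with zeros
            self.data = [0.0] * (self.rows * self.cols)
        if len(self.data) != self.rows * self.cols:
            raise ValueError("data length does not match dimensions")

    def __getitem__(self, ij):
        i, j = ij
        return self.data[i * self.cols + j]

    def __setitem__(self, ij, value):
        i, j = ij
        self.data[i * self.cols + j] = value

    def copy(self):                             # move (copy) in storage
        return Matrix(self.rows, self.cols, list(self.data))


def scale(c, a):                                # a constant and a matrix
    return Matrix(a.rows, a.cols, [c * x for x in a.data])


def add_row_vector(v, a):                       # a vector and a matrix
    if v.rows != 1 or v.cols != a.cols:
        raise ValueError("row vector length must equal the number of columns")
    out = a.copy()
    for i in range(a.rows):
        for j in range(a.cols):
            out[i, j] += v[0, j]
    return out


def multiply(a, b, result=None):                # two matrices plus a result matrix
    if a.cols != b.rows:
        raise ValueError("inner dimensions differ")
    if result is None:
        result = Matrix(a.rows, b.cols)
    for i in range(a.rows):
        for j in range(b.cols):
            s = 0.0
            for k in range(a.cols):
                s += a[i, k] * b[k, j]
            result[i, j] = s
    return result
```

Passing an existing result matrix to multiply lets a caller reuse storage, which matters when memory is limited to a few kilobytes.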

In our particular application only one inversion, of a 2 by 2 matrix, was involved. A simple algorithm derived from Euler's method is implemented using fixed pivots. Execution time and temporary storage are optimized.
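A fixed-pivot Gauss-Jordan elimination for the 2 by 2 case can be sketched as follows. It is offered as an illustration of fixed pivoting (no row interchanges) and is not claimed to be the report's Euler-derived routine. It forms two reciprocals, so only two divisions are performed.

```python
def invert_2x2_fixed_pivot(a):
    """Invert [[a11, a12], [a21, a22]] by Gauss-Jordan elimination with the
    diagonal elements as fixed pivots (no row interchanges)."""
    (a11, a12), (a21, a22) = a
    if a11 == 0.0:
        raise ZeroDivisionError("zero first pivot; fixed pivoting cannot proceed")
    r1 = 1.0 / a11                 # division 1: reciprocal of the first pivot
    u = a12 * r1                   # row 1 scaled: [1, u | r1, 0]
    p = a22 - a21 * u              # second pivot after eliminating a21
    if p == 0.0:
        raise ZeroDivisionError("matrix is singular")
    rp = 1.0 / p                   # division 2: reciprocal of the second pivot
    b21 = -a21 * r1 * rp           # row 2 scaled: [0, 1 | b21, b22]
    b22 = rp
    b11 = r1 - u * b21             # eliminate u from row 1 using row 2
    b12 = -u * b22
    return [[b11, b12], [b21, b22]]


if __name__ == "__main__":
    print(invert_2x2_fixed_pivot([[4.0, 7.0], [2.0, 6.0]]))
```

The price of fixed pivots is that a zero leading element stops the method even when the matrix is invertible, for example [[0, 1], [1, 0]]; presumably this is acceptable when the structure of the matrix is known in advance.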



Implementing the Experiment:

Having developed the tools discussed in the previous sections, the actual implementation was straightforward. For the reasons already mentioned, a routine was written to match the LSQ routine* developed by Dr. Marini almost statement by statement. The correspondence is indicated in the source program by keeping track of the APL statement numbers. The array names were kept the same as much as possible, and only one additional temporary matrix was required. The program was written for a maximum of 100 observations.

All matrix operations as well as the square root keep track of the calls to 
the floating point routines. 

The whole package makes limited use of two monitor routines, which can easily be eliminated. They are there because the software was developed in machine language and the monitor provided a lot of needed help. So, essentially, LSQ can be run completely independently.

The space requirements for this particular run were approximately 16k bytes, of which 4k could be in ROM and 12k in RAM. The exact numbers are as follows:

Code:    3656 bytes
Data:   10365 bytes
Stack:    100 bytes (arbitrarily)
Total:  14121 bytes

Incorporated into the package were four counting routines that kept track of the number of additions, subtractions, multiplications and divisions required during each iteration. The results are analyzed in the next section. The actual implementation would not require these routines. The counting overhead on each arithmetic operation is approximately equal to half the time of an addition.


* See Appendix C. 
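The counting idea can be sketched as a thin wrapper that increments a tally before performing each operation. The Python class and the dot-product helper below are hypothetical illustrations of the technique, not the report's 8080 routines.

```python
from collections import Counter


class CountingArithmetic:
    """Performs +, -, *, / on Python floats and counts each kind of call."""

    def __init__(self) -> None:
        self.counts = Counter()

    def add(self, a: float, b: float) -> float:
        self.counts["add"] += 1
        return a + b

    def sub(self, a: float, b: float) -> float:
        self.counts["sub"] += 1
        return a - b

    def mul(self, a: float, b: float) -> float:
        self.counts["mul"] += 1
        return a * b

    def div(self, a: float, b: float) -> float:
        self.counts["div"] += 1
        return a / b

    def reset(self) -> None:
        """Start a new tally, for example at the beginning of an iteration."""
        self.counts.clear()


def dot(ops: CountingArithmetic, x, y) -> float:
    """Inner product of two equal-length, non-empty vectors via the counting layer."""
    total = ops.mul(x[0], y[0])
    for a, b in zip(x[1:], y[1:]):
        total = ops.add(total, ops.mul(a, b))
    return total


if __name__ == "__main__":
    ops = CountingArithmetic()
    print(dot(ops, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), dict(ops.counts))
```

A dot product of length n records n multiplications and n - 1 additions, which is the kind of per-routine tally that, summed over all matrix operations, gives the per-iteration counts.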



Interpreting the Results: 


The final run converged and yielded five digits of accuracy. If convergence is defined as the ratio of two successive RMS residuals being close to 1 (in absolute value), it was attained at the ninth iteration to within 0.00001. Comparing these results to the run at GSFC (run in double precision, or 16 digits of accuracy), we note the 5-digit accuracy of our result.
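The convergence test described above can be written as a short Python function. The tolerance of 0.00001 comes from the text; the residual history in the usage line is made up for illustration, and the function names are hypothetical.

```python
def ratio_converged(prev_rms: float, curr_rms: float, tol: float = 1e-5) -> bool:
    """True when |curr / prev| is within `tol` of 1."""
    return abs(abs(curr_rms / prev_rms) - 1.0) <= tol


def first_converged_iteration(rms_history, tol: float = 1e-5):
    """1-based index of the first iteration whose RMS residual passes the ratio
    test against the previous iteration, or None if none does."""
    for i in range(1, len(rms_history)):
        if ratio_converged(rms_history[i - 1], rms_history[i], tol):
            return i + 1
    return None


if __name__ == "__main__":
    print(first_converged_iteration([1.0, 0.5, 0.2, 0.1, 0.1000005]))
```

For the made-up history above the function returns 5, since only the last ratio (about 1.000005) lies within 0.00001 of 1.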

Numerical analysis gives us enough tools to justify the loss of two significant digits in the course of the iterations. The main source of error appears to be the subtraction of the estimated range rates from the actual ones. The subtraction of the average residual equations could contribute to the error as well.
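The effect of subtracting nearly equal quantities can be demonstrated in Python. The sketch below rounds two hypothetical range rates to IEEE 754 single precision, used here only as a stand-in for the report's 24-bit hexadecimal format, and compares the relative error of an input with that of the difference.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)


# Hypothetical actual and estimated range rates that agree in their leading digits.
actual = 2345.6789123
estimated = 2345.3310456

exact_difference = actual - estimated
single_difference = to_single(to_single(actual) - to_single(estimated))

input_error = relative_error(to_single(actual), actual)                 # about 2e-8
difference_error = relative_error(single_difference, exact_difference)  # about 1e-4

if __name__ == "__main__":
    print(f"relative error of an input:   {input_error:.1e}")
    print(f"relative error of difference: {difference_error:.1e}")
```

Each input is held to about 2 parts in 10^8, but the difference is correct to only about 1 part in 10^4: the leading digits cancel while the rounding errors of the inputs remain.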

The measured execution time for this particular run was 62 seconds per iteration. The microprocessor used was an Intel 8080A. Adjusting for the overhead of counting the operations, the true time becomes 61 seconds. The 8080A CPU has a cycle time of 2 microseconds. If this system were actually implemented, the 8080A-1 CPU could be used, which offers higher speed with a cycle time of 1.3 microseconds. This could bring the execution time down to 40 seconds per iteration, giving approximately 6 minutes to reach convergence. This figure is derived with no modification of the software. Since it falls within our definition of "real time", which was around 15 minutes, it is definitely a workable solution.
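The timing extrapolation above is simple proportional scaling, sketched below in Python. It assumes, as the text does, that the unchanged software needs the same number of machine cycles on the faster part, so the time per iteration scales with the cycle time. The constant and function names are hypothetical.

```python
MEASURED_SECONDS = 62.0      # per iteration, with operation counting enabled
TRUE_SECONDS = 61.0          # per iteration, counting overhead removed
CYCLE_8080A_US = 2.0         # 8080A cycle time, microseconds
CYCLE_8080A1_US = 1.3        # 8080A-1 cycle time, microseconds
ITERATIONS = 9               # convergence was reached at the ninth iteration
REAL_TIME_LIMIT_MIN = 15.0   # working definition of "real time", in minutes


def scaled_time(seconds: float, old_cycle: float, new_cycle: float) -> float:
    """Run time assumed proportional to cycle time (same software, same cycles)."""
    return seconds * new_cycle / old_cycle


counting_overhead = MEASURED_SECONDS - TRUE_SECONDS
per_iteration = scaled_time(TRUE_SECONDS, CYCLE_8080A_US, CYCLE_8080A1_US)
total_minutes = per_iteration * ITERATIONS / 60.0

if __name__ == "__main__":
    print(f"counting overhead: {counting_overhead:.0f} s per iteration")
    print(f"{per_iteration:.2f} s per iteration")   # 39.65 s
    print(f"{total_minutes:.2f} min to converge")   # 5.95 min
    print(total_minutes < REAL_TIME_LIMIT_MIN)      # True
```

This gives 39.65 seconds per iteration and about 5.95 minutes for nine iterations, in line with the 40-second and 6-minute figures in the text.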

Another alternative is, of course, to use hardware floating point units. Two units that we are familiar with indicate a disparity in execution times of several orders of magnitude. Their specifications appear in Appendix B. For the purposes of the following analysis, 'typical' execution times for 8 digits of precision of the North Star Computers, Inc. FPB unit were used.

Our system indicated the following frequency of floating point operations for each iteration:

Additions - 3137
Subtractions - 672
Multiplications - 2382
Divisions - 940

When trying to compute the time it would take to execute those operations, we noticed that the time it takes to access the hardware floating point unit is more than twice the time it takes to do the calculations. Namely, we came up with the following numbers:

TIME (SEC)   PURPOSE

0.35         perform the operations
0.825        input and output the numbers to and from the FPB (8080A-1)
1.175        total time required

Therefore, the use of hardware units makes it possible to decrease the execution time by one order of magnitude.
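The arithmetic part of this estimate can be reproduced approximately in Python by multiplying each operation count by the FPB's 'typical' 8-digit time. The times below are read from the Appendix B data sheet, whose scan is partly illegible, so they are assumptions. With them the arithmetic comes to about 0.37 seconds, close to, but not exactly, the 0.35 seconds quoted above.

```python
# Per-iteration operation counts reported above.
COUNTS = {"add": 3137, "sub": 672, "mul": 2382, "div": 940}

# 'Typical' 8-digit execution times of the North Star FPB, in microseconds,
# as read from the Appendix B data sheet (assumed values; the scan is unclear).
TYPICAL_8_DIGIT_US = {"add": 9, "sub": 9, "mul": 80, "div": 156}

IO_SECONDS = 0.825   # argument and result transfer time quoted in the report


def arithmetic_seconds(counts: dict, times_us: dict) -> float:
    """Time spent inside the FPB, excluding transfers to and from the board."""
    return sum(n * times_us[op] for op, n in counts.items()) / 1e6


compute = arithmetic_seconds(COUNTS, TYPICAL_8_DIGIT_US)
total = compute + IO_SECONDS

if __name__ == "__main__":
    print(f"arithmetic: {compute:.3f} s")   # 0.371 s
    print(f"with I/O:   {total:.3f} s")     # 1.196 s
```

Either way the transfer time dominates, which is the point made in the text.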



Future Research:


The parameters that have to be optimized in the search and rescue mission are the accuracy of the position estimation and the time in which it is performed. Proving the feasibility of a microprocessor implementation is far from devising an optimal algorithm.

If the nonlinear regression method is used, there is a lot of room for improvement in the initial estimate, a quantity that can affect the whole outcome of the iterations. Several methods suggested in Dr. Marini's paper can be explored. Furthermore, since the data collection takes an appreciable amount of time, an algorithm should be devised in which the estimate is updated with each incoming datum. If that algorithm is good enough, the estimate could be the result itself.

A further improvement in calculation time can be achieved through parallelism. It can appear on two levels:

• the implementation of the least squares algorithm

• the grouping of data

The least squares algorithm may be broken into subtasks, especially floating point operations, that can be performed by different processors in parallel.

The data may be grouped in clusters to which the least squares algorithm is applied. The estimates provided by the clusters are then themselves processed through least squares estimation. This method could be applied at data collection time too.
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Appendix B 


Two typical hardware floating point units 

• FPB by North Star Computers, Inc.

• FPU by Cybernetic Micro Systems



FPB DATA SHEET


EXECUTION TIMES (notes 1, 2, 3)

PRECISION DIGITS:       2     4     6     8    10    12    14

ADD         best        1     1     1     1     1     1     1
            typical     8     8     9     9    10    10     -
            worst      10    10    10    11    11    12     -

SUBTRACT    best        4     4     4     4     -     -     -
            typical     8     8     9     9     -     -     -
            worst      15    16    17    18     -     -     -

MULTIPLY    best        5     5     5     5     5     5     5
            typical    18    34    55    80   111   146   186
            worst      51   125   228   382   527   720   933

DIVIDE      best        7     7     7     7     7     7     7
            typical    39    70   109   156   211   274   370
            worst      62   139   229   340   470   621   779


1. Times given in microseconds 

2. Execution times are a function of the input values 

3. Times listed do not include transmission of input values and result 


Board dimensions:

Model A: 8 in. by 10 in.

Model B: 6% in. by 12 in.

Power requirements:

Model A: 8 V (unregulated) @ 1.7 A
Model B: 5 V (regulated) @ 1.7 A

Board Construction:

FR4 material, gold plated edge connectors

Floating point number representation:

Byte 1: bit 7 = sign (1 = negative number, 0 = positive number)
        bits 6-0 = exponent in excess 64 binary representation
        bits 7-0 = zero represents the zero value
Byte 2: bits 3-0 = least significant digit of value in BCD coding
        bits 7-4 = next least significant digit of value

Byte n: bits 7-4 = most significant digit of value in BCD coding
        bits 3-0 = next most significant digit of value

All values are normalized.

Other representations of BCD floating point numbers require a change in microcode and are available on special order.
*Sample use of the North Star FPB for a divide operation with 8 digit precision
*In this example assume arguments are in memory in form:
* Most significant byte (msb) digit pair
* Subsequent digit pairs follow the msb
* Exponent/sign byte follows lsb digit pair
* Pointer addresses the exponent/sign byte
*BC has left arg pointer
*DE has right arg pointer
*HL has result pointer
*
*The FPB receives its arguments by "peeking" at the 8080 bus
*when the argument values are loaded to accumulator.
*
*Two jumperable "hardwired" addresses are required for signaling the FPB
*
*This routine may be generalized to perform any operation, at any precision.
*
FDIV    LDA   RSTRT          ;This "hardwired" reference signals FPB to "wake up"
        MVI   A,8*16+DIVOP   ;Specify precision and operation code to FPB
        LDAX  D              ;Exponent/sign byte of right arg
        DCX   D              ;Advance pointer to next byte
        LDAX  D              ;Least significant digit pair of right arg
        DCX   D              ;Advance pointer to next byte
        LDAX  D              ;Next digit pair of right arg
        DCX   D
        LDAX  D              ;Next digit pair of right arg
        DCX   D
        LDAX  D              ;Most significant digit pair of right arg
        LDAX  B              ;Exponent/sign byte of left arg
        DCX   B
        LDAX  B              ;Least significant digit pair of left arg
        DCX   B
        LDAX  B              ;Next digit pair of left arg
        DCX   B
        LDAX  B              ;Next digit pair of left arg
        DCX   B
        LDAX  B              ;Most significant digit pair of left arg
*Now the Floating Point Board is performing the operation
        LXI   D,FPDIN        ;"Hardwired" address for receiving value from FPB
FDIV1   LDAX  D              ;Loop waiting for completion signal (sign bit)
        ORA   A              ;The FPB is done when the sign bit becomes "1"
        JP    FDIV1          ;Loop if sign bit is still "0"
        ANI   EBITS          ;Check for error, condition tested at end
        LDAX  D              ;Exponent/sign of result
        MOV   M,A            ;Store exponent/sign of result
        DCX   H              ;Advance pointer
        LDAX  D              ;Least significant digit pair of result
        MOV   M,A
        DCX   H
        LDAX  D              ;Next digit pair of result
        MOV   M,A
        DCX   H
        LDAX  D              ;Next digit pair of result
        MOV   M,A
        DCX   H
        LDAX  D              ;msb byte of result
        MOV   M,A            ;Store it
        RZ                   ;Return if no error was detected
        JMP   ERROR          ;Go report error (i.e. underflow or divide by 0)
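To make the memory form expected by FDIV concrete, the following Python sketch packs a value into BCD digit pairs followed by the exponent/sign byte, so that the most significant pair has the lowest address and the routine's pointer would address the last byte. It assumes the decimal digits are supplied already normalized, encodes the exponent in excess 64 as the data sheet states, and uses hypothetical function names; it does not model the board itself.

```python
def encode_fpb(digits: str, exponent: int, negative: bool = False) -> bytes:
    """Pack decimal digits (an even count, leading digit nonzero) and an exponent
    into memory order: msb digit pair first, exponent/sign byte last."""
    if not digits or len(digits) % 2 or any(c not in "0123456789" for c in digits):
        raise ValueError("need a non-empty, even number of decimal digits")
    if digits[0] == "0":
        raise ValueError("normalized values have a nonzero leading digit")
    biased = exponent + 64                       # excess-64 exponent
    if not 1 <= biased <= 127:
        raise ValueError("exponent out of range")
    pairs = bytes((int(digits[i]) << 4) | int(digits[i + 1])   # two BCD digits per byte
                  for i in range(0, len(digits), 2))
    sign_exponent = (0x80 if negative else 0x00) | biased
    return pairs + bytes([sign_exponent])


def decode_fpb(raw: bytes):
    """Inverse of encode_fpb; an all-zero exponent/sign byte means the value zero."""
    sign_exponent = raw[-1]
    if sign_exponent == 0:
        return ("0" * 2 * (len(raw) - 1), 0, False)
    negative = bool(sign_exponent & 0x80)
    exponent = (sign_exponent & 0x7F) - 64
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in raw[:-1])
    return digits, exponent, negative


if __name__ == "__main__":
    packed = encode_fpb("31415927", 1)
    print(packed.hex())          # 3141592741
    print(decode_fpb(packed))
```

For 8-digit precision this produces the five bytes per argument that the listing reads with one LDAX followed by four DCX/LDAX pairs.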



FLOATING POINT UNIT


PRICE LIST

MODEL    QUANTITY
         1          25         100

#1       $595.00    $535.00    $475.00
#2        470.00     425.00     375.00
#3        345.00     315.00     275.00

All sales FOB Palo Alto


EXECUTION TIMES

FUNCTION               TIME IN MILLISECONDS (approximate)

ADD, SUB               110
MUL, DIV, SQRT         225
TAN                    846
LN, SIN, COS, ->POL    1250
POWER                  1720


CYBERNETIC MICRO SYSTEMS
2460 EMBARCADERO WAY
PALO ALTO, CA 94303
(415) 321-0410






Appendix C 


The APL least squares program 
Microprocessor Utilization in Search & Rescue Missions

FINAL REPORT


Appendix A


Sample run at GSFC
Sample run at Columbia


Sample run at Columbia
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RMS RESK-URLS = 0 . li 54 t" 154 G 2 E +01 


THE RESULTING POSITION IS:

-K= 0 . 62 -i 6 ? 47 S 55 E +03 V= -6 4 ?:l 2 ' 5 i 7 O 47 E +04 


^^ 5 ~F'ESIC.:iJRLS - “ 0 ”iti 50 SS 20 S 7 E+ 0 ± 


'■’Lt. 7 t't' 2 ."*.’&*r* 2 E '^04 


THE RESULTING POSITION IS:

■<:= 0 846458721 iE +02 Y- -O 4 S::' 9673 R? 5 E +04 

RMS“RESK'URLS ' = "' 0 ~ 13 S? 477723 E +00 


THE RESULTING POSITION IS:

,>,™ G 8‘U* r 0 J. c-5E"^0j: V*" — 0. *f'*8 _-8*7' •4*f'“^*,'*E"*'^ !*■! 

RMS RESIDUALS = 0.0706402111E+00


THE RESULTING POSITION IS:

X= 0. 9S751506S0E+02 V- -O. 4S?8407516F+O-i 

RMS PESIDURLS '= 0. 065O796T1SE+DO 

THE RESULTING POSITION IS:

0. S99S647140E+O2 -0. 482 83789066+04 

RMS RESIDUALS = 0.0647644424E+00
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THE RESULTING POSITION IS 
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THE“RESULTTf?G POSITION IS: 
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THE RESULTING POSITIOLF IS' 
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RMS RESK'URLS 


0.0647432444E+00



Apperdix B 


'Dwo typical hardware floating point units 

* FPB by North Star Ctsrputers, Inc. 

• FPU by Cyberuetic Micro Systems 
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FPB DATA SHEET 
EXECUTION TIMES 1.2,3 


PRECISION DIGITS; 

2 

1 

i 

1 

6 

8 

10 

12 

14 

ADD best 

1 

-1 

1 

1 

1 

1 

■1 

typical 

8 

8 

9 

9 

10 

10 

Bl 

worst 

10 

10 

10 

11 

n 

12 


SUBTRACT best 

4 


■ 4 


■I 

4 

■1 

typical 

8 


9 


■■ 

10 

Bfl 

worst 

15 


17 


mm 

20 

HI 

MULTIPLY best 

b 

5 

5 

5 

5 

— 

5 

typical 

18 

34 

55 

80 

111 


186 

worst 

51 

125 

228 

382 

527 

720 

933 

DIVIDE best 

7 

7 . 

7 

7 

mm 

7 

7 

typical 

39 

70 

109 

156 


274 

370 

worst 

62 

139 

229 

340 


621 

779 


1. Times given in microseconds 

2. Execution times are a function of the input values 

3. Times listed do not include transmission of input values and result 


Board dimensions: 

Model A: Bin. by 10 in. 

Model B; 6% in. by 1 2 in. 

Power requirements: 

Model A: 8 V (unregulated) @ 1 .7 A 
Model B: 5 V (regulated) @ 1 .7 A 

Board Construction: 

FR4 material, gold plated edge connectois 

Floating point number representation:

Byte 1: bit 7 = sign (1 = negative number, 0 = positive number)
        bits 6-0 = exponent in excess-64 binary representation
        bits 7-0 all zero represent the value zero
Byte 2: bits 3-0 = least significant digit of value in BCD coding
        bits 7-4 = next least significant digit of value

Byte n: bits 7-4 = most significant digit of value in BCD coding
        bits 3-0 = next most significant digit of value

All values are normalized.

Other representations of BCD floating point numbers require a change in microcode and are available on 
special order. 
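The byte layout above can be made concrete with a short Python decoder. This is a minimal sketch under an assumption: the data sheet gives only the field layout, so the sketch treats the normalized mantissa as the decimal fraction 0.d1d2... scaled by a power of ten. The function name decode_fpb is hypothetical.

```python
from fractions import Fraction


def decode_fpb(data: bytes) -> Fraction:
    """Decode an FPB BCD float given in data-sheet byte order.

    data[0] is Byte 1 (sign and excess-64 exponent), data[1] is Byte 2
    (least significant digit pair) and data[-1] is Byte n (most significant pair).
    ASSUMPTION: the mantissa is the fraction 0.d1 d2 ... and the exponent is a
    power of ten; the data sheet does not state this explicitly.
    """
    exp_sign = data[0]
    if exp_sign == 0:                  # Byte 1 all zero represents the value zero
        return Fraction(0)
    negative = bool(exp_sign & 0x80)   # bit 7: 1 = negative number
    exponent = (exp_sign & 0x7F) - 64  # bits 6-0: excess-64 exponent
    mantissa = 0
    ndigits = 0
    for pair in reversed(data[1:]):    # walk from the most significant pair down
        for digit in (pair >> 4, pair & 0x0F):  # bits 7-4 are the more significant digit
            if digit > 9:
                raise ValueError(f"invalid BCD digit in byte {pair:#04x}")
            mantissa = mantissa * 10 + digit
            ndigits += 1
    value = Fraction(mantissa, 10 ** ndigits) * Fraction(10) ** exponent
    return -value if negative else value
```

Under this assumption, the 8-digit value 12.345678 is the byte sequence 0x42, 0x78, 0x56, 0x34, 0x12, with the exponent/sign byte first as on the data sheet. Fraction keeps the decimal digits exact rather than rounding them through binary floating point.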



*Sample use of the North Star FPB for a divide operation with 8-digit precision.
*In this example assume the arguments are in memory in the form:
*   Most significant byte (msb) digit pair
*   Subsequent digit pairs follow the msb
*   Exponent/sign byte follows the lsb digit pair
*   Pointer addresses the exponent/sign byte
*BC has left arg pointer
*DE has right arg pointer
*HL has result pointer
*The FPB receives its arguments by "peeking" at the 8080 bus
*when the argument values are loaded to the accumulator.
*Two jumperable "hardwired" addresses are required for signaling the FPB.
*This routine may be generalized to perform any operation, at any precision.

FDIV    LDA   RSTRT          ; This "hardwired" reference signals FPB to "wake up"
        MVI   A,8*16+DIVOP   ; Specify precision and operation code to FPB
        LDAX  D              ; Exponent/sign byte of right arg
        DCX   D              ; Advance pointer to next byte
        LDAX  D              ; Least significant digit pair of right arg
        DCX   D              ; Advance pointer to next byte
        LDAX  D
        DCX   D
        LDAX  D
        DCX   D
        LDAX  D              ; Most significant digit pair of right arg
        LDAX  B              ; Exponent/sign byte of left arg
        DCX   B
        LDAX  B              ; Least significant digit pair of left arg
        DCX   B
        LDAX  B
        DCX   B
        LDAX  B
        DCX   B
        LDAX  B              ; Most significant digit pair of left arg
*Now the Floating Point Board is performing the operation
        LXI   D,FPDIN        ; "Hardwired" address for receiving value from FPB
FDIVI   LDAX  D              ; Loop waiting for completion signal (sign bit)
        ORA   A              ; The FPB is done when the sign bit becomes "1"
        JP    FDIVI          ; Loop if sign bit is still "0"
        ANI   EBITS          ; Check for error, condition tested at end
        LDAX  D              ; Exponent/sign of result
        MOV   M,A            ; Store exponent/sign of result
        DCX   H              ; Advance pointer
        LDAX  D              ; Least significant digit pair of result
        MOV   M,A
        DCX   H
        LDAX  D
        MOV   M,A
        DCX   H
        LDAX  D
        MOV   M,A
        DCX   H
        LDAX  D              ; msb byte of result
        MOV   M,A            ; Store it
        RZ                   ; Return if no error was detected
        JMP   ERROR          ; Go report error (i.e. underflow or divide by 0)
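The handshake in the routine above can be modelled in Python to make the byte order explicit. This is only a sketch, and several pieces are assumptions:

* fpb_bus_sequence and read_fpb_result are hypothetical names.
* The operation code (DIVOP) and error mask (EBITS) values are not given in the listing, so they are passed in as parameters.
* The op code is assumed to occupy the low four bits of the command byte, as the expression 8*16+DIVOP suggests.
* The FPDIN port is simulated by a Python iterator of bytes.

```python
from collections.abc import Iterator


def fpb_bus_sequence(op_code: int, precision: int,
                     left: bytes, right: bytes) -> list[int]:
    """Bytes the sample FDIV routine places on the 8080 data bus, in order.

    `left` and `right` are operands as stored in memory (msb digit pair at the
    lowest address, exponent/sign byte last). The initial LDA RSTRT wake-up is
    an address reference, not a data byte, so it is not listed.
    """
    # ASSUMPTION: the op code sits in the low nibble, as 8*16+DIVOP suggests.
    if not 0 <= op_code < 16:
        raise ValueError("op code must fit in the low nibble")
    if precision % 2 or not 2 <= precision <= 14:
        raise ValueError("precision must be an even digit count from 2 to 14")
    n = precision // 2 + 1  # exponent/sign byte plus one byte per digit pair
    if len(left) != n or len(right) != n:
        raise ValueError(f"each operand must be {n} bytes")
    seq = [precision * 16 + op_code]  # MVI A,8*16+DIVOP
    seq.extend(reversed(right))       # LDAX D / DCX D: exponent/sign, lsb ... msb
    seq.extend(reversed(left))        # LDAX B / DCX B: same order for the left arg
    return seq


def read_fpb_result(port: Iterator[int], precision: int,
                    ebits: int) -> tuple[bytes, bool]:
    """Mimic the FDIVI loop: poll until bit 7 is set, test the error bits,
    then read the result and store it the way the HL pointer does.

    `ebits` is a hypothetical error mask standing in for EBITS.
    """
    status = next(port)
    while not status & 0x80:          # JP FDIVI: loop while the sign bit is 0
        status = next(port)
    ok = (status & ebits) == 0        # ANI EBITS; RZ tests this flag at the end
    n = precision // 2 + 1
    received = [next(port) for _ in range(n)]  # exponent/sign first, then lsb ... msb
    # HL moves downward, so memory ends up msb-first with exponent/sign last.
    return bytes(reversed(received)), ok
```

For the 8-digit divide in the listing, each operand and the result occupy five bytes: one exponent/sign byte and four digit pairs. This matches the five LDAX reads per argument and per result.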
FLOATING POINT UNIT


PRICE LIST

                     QUANTITY
MODEL         1          25         100
#1         $595.00    $535.00    $475.00
#2          470.00     425.00     375.00
#3          345.00     315.00     275.00


All sales FOB Palo Alto 


EXECUTION TIMES 


FUNCTION                TIME IN MILLISECONDS (approximate)

ADD, SUB                 110
MUL, DIV, SQRT           225
TAN                      846
LN, SIN, COS, ?-POL     1250
POWER                   1720


CYBERNETIC MICRO SYSTEMS

2460 EMBARCADERO WAY 
PALO ALTO, CA 94303 

(415) 321-0410
Appendix C

The APL least squares program 